31 research outputs found

    MonetDB/X100 - A DBMS in the CPU cache

    X100 is a new execution engine for the MonetDB system that improves execution speed and overcomes its main-memory limitation. It introduces t…

    Positional Delta Trees to reconcile updates with read-optimized data storage

    We investigate techniques that marry the high read-only, analytical query performance of compressed, replicated column storage (“read-optimized” databases) with the ability to handle a high-throughput update workload. Today’s large RAM sizes and the growing gap between sequential and random disk I/O throughput bring this once-elusive goal within reach, as it has become possible to buffer enough updates in memory to allow background migration of these updates to disk, where efficient sequential I/O is amortized over many updates. Our key goal is that read-only queries always see the latest database state, yet are not (significantly) slowed down by the update processing. To this end, we propose the Positional Delta Tree (PDT), which is designed to minimize the overhead of on-the-fly merging of differential updates into (index) scans on stale disk-based data. We describe the PDT data structure and its basic operations (lookup, insert, delete, modify) and provide a detailed study of their performance. Further, we propose a storage architecture called Replicated Mirrors, which replicates tables in multiple orders, storing each table copy mirrored in both column- and row-wise data formats, and uses PDTs to handle updates. Experiments in the MonetDB/X100 system show that this integrated architecture is able to achieve our main goals.
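
    The PDT itself is a counted tree keyed on tuple position, which this listing does not detail. As a rough illustration of the merge it enables, the C++ sketch below folds positionally keyed inserts, deletes and modifies into a sequential scan of stale column data; the Delta type, the plain ordered map, and the one-delta-per-position restriction are simplifications for illustration, not the paper's actual data structure.

        #include <cstdint>
        #include <map>
        #include <string>
        #include <vector>

        // Hypothetical differential update, keyed by the tuple position at which it
        // takes effect (the real PDT distinguishes stable tuple IDs from current
        // positions; that detail is omitted here).
        struct Delta {
            enum Kind { Insert, Delete, Modify } kind;
            std::string value;  // payload for Insert/Modify; unused for Delete
        };

        // Merge buffered deltas into a sequential scan of the stale, read-optimized
        // column. An ordered map with at most one delta per position stands in for
        // the counted tree used by the real PDT.
        std::vector<std::string> merged_scan(const std::vector<std::string>& stale,
                                             const std::map<uint64_t, Delta>& pdt) {
            std::vector<std::string> out;
            for (uint64_t pos = 0; pos < stale.size(); ++pos) {
                auto it = pdt.find(pos);
                if (it == pdt.end()) {               // no pending update here
                    out.push_back(stale[pos]);
                } else if (it->second.kind == Delta::Insert) {
                    out.push_back(it->second.value); // new tuple lands before stale[pos]
                    out.push_back(stale[pos]);
                } else if (it->second.kind == Delta::Modify) {
                    out.push_back(it->second.value); // replaces the stale value
                }                                    // Delta::Delete: emit nothing
            }
            return out;
        }

    A real PDT keeps the deltas in a counted tree so that positions shift correctly as inserts and deletes accumulate and lookup, insert, delete and modify remain logarithmic.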

    Super-Scalar RAM-CPU Cache Compression

    High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-e…
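
    The lightweight compression schemes this paper introduces (PFOR, PFOR-DELTA and PDICT) trade a little CPU work for much less I/O. The C++ sketch below illustrates only the PFOR idea: encode values relative to a frame-of-reference base and patch outliers back in as exceptions after a tight decoding loop. The byte-wide codes and plain exception arrays are simplifications; the real scheme bit-packs codes and stores exceptions far more compactly.

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Illustrative PFOR-style block: values near a frame-of-reference base are
        // stored as small codes; outliers become "exceptions" patched back in later.
        struct PforBlock {
            int32_t base = 0;               // frame of reference
            std::vector<uint8_t> codes;     // value - base, when it fits in one byte
            std::vector<uint32_t> exc_pos;  // positions of exceptions
            std::vector<int32_t> exc_val;   // their original values
        };

        PforBlock pfor_encode(const std::vector<int32_t>& in, int32_t base) {
            PforBlock b;
            b.base = base;
            for (uint32_t i = 0; i < in.size(); ++i) {
                int64_t d = static_cast<int64_t>(in[i]) - base;
                if (d >= 0 && d < 256) {
                    b.codes.push_back(static_cast<uint8_t>(d));
                } else {                    // outlier: keep a placeholder code
                    b.codes.push_back(0);
                    b.exc_pos.push_back(i);
                    b.exc_val.push_back(in[i]);
                }
            }
            return b;
        }

        std::vector<int32_t> pfor_decode(const PforBlock& b) {
            // Pass 1: decode every code in a tight, branch-free loop; this is the
            // part meant to keep a super-scalar CPU busy at RAM bandwidth.
            std::vector<int32_t> out(b.codes.size());
            for (std::size_t i = 0; i < b.codes.size(); ++i)
                out[i] = b.base + b.codes[i];
            // Pass 2: patch the (hopefully rare) exceptions.
            for (std::size_t e = 0; e < b.exc_pos.size(); ++e)
                out[b.exc_pos[e]] = b.exc_val[e];
            return out;
        }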

    Flexible and efficient IR using array databases

    The Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. These optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted-list processing, including compression, score materialization and quantization, as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.
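
    For reference, the Okapi BM25 formula that the array queries mirror can be written as a straightforward scoring loop. The C++ sketch below is only a hand-written illustration of that formula over per-term postings; the posting layout, parameter defaults and function names are illustrative and are not SRAM's generated plan.

        #include <cmath>
        #include <cstdint>
        #include <vector>

        // Robertson/Sparck Jones-style IDF commonly used with BM25.
        double bm25_idf(double num_docs, double doc_freq) {
            return std::log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5));
        }

        // One posting of the sparse (term, document) -> term-frequency array.
        struct Posting { uint32_t doc; uint32_t tf; };

        // Accumulate one query term's BM25 contribution into per-document scores:
        //   score(d) += idf * tf*(k1+1) / (tf + k1*(1 - b + b*|d|/avgdl))
        void bm25_accumulate(const std::vector<Posting>& postings, double idf,
                             const std::vector<double>& doclen, double avgdl,
                             std::vector<double>& score,
                             double k1 = 1.2, double b = 0.75) {
            for (const Posting& p : postings) {
                double norm = k1 * (1.0 - b + b * doclen[p.doc] / avgdl);
                score[p.doc] += idf * (p.tf * (k1 + 1.0)) / (p.tf + norm);
            }
        }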

    From Cooperative Scans to Predictive Buffer Management

    In analytical applications, database systems often need to sustain workloads with multiple concurrent scans hitting the same table. The Cooperative Scans (CScans) framework, which introduces an Active Buffer Manager (ABM) component into the database architecture, has been the most effective and elaborate response to this problem, and was initially developed in the X100 research prototype. We now report on our experience of integrating Cooperative Scans into its industrial-strength successor, the Vectorwise database product. During this implementation we invented a simpler optimization of concurrent scan buffer management, called Predictive Buffer Management (PBM). PBM is based on the observation that in a workload with long-running scans, the buffer manager has a great deal of information on the workload in the immediate future, such that an approximation of the ideal OPT algorithm becomes feasible. In an evaluation on both synthetic benchmarks and a TPC-H throughput run, we compare the benefits of naive buffer management (LRU) versus CScans, PBM and OPT, showing that PBM achieves benefits close to Cooperative Scans while incurring much lower architectural impact.
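
    The OPT-style idea PBM builds on can be sketched compactly: long-running scans announce which table ranges they will read next, so the buffer manager can estimate when each cached page will be needed again and evict the page whose next use lies furthest in the future. The C++ sketch below shows only that victim-selection rule; how Vectorwise actually estimates next-use times and organizes its buffer is not shown, and the names are illustrative.

        #include <cstdint>
        #include <limits>
        #include <unordered_map>

        // Predicted next-access time per cached page, derived from the positions
        // and speeds of the currently running scans; pages no running scan will
        // touch get +infinity. Producing these estimates is the actual substance
        // of PBM and is not shown here.
        using NextUseMap = std::unordered_map<uint64_t /*page*/, double /*time*/>;

        // Belady/OPT-style victim selection: evict the cached page whose predicted
        // next use lies furthest in the future.
        uint64_t pick_victim(const NextUseMap& next_use) {
            uint64_t victim = 0;
            double furthest = -std::numeric_limits<double>::infinity();
            for (const auto& [page, when] : next_use) {
                if (when > furthest) {
                    furthest = when;
                    victim = page;
                }
            }
            return victim;
        }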

    VectorWise

    VectorWise (2008) is based on scientific research results and derives its strength from a completely new approach to data processing. The approach makes use of vector processing on data sets, in which every vector is tailored to the size of the cache memory of modern processors. The vectorized execution model also allows exploiting SIMD features (such as SSE in Intel CPUs) as well as multi-core technology, and is complemented by innovations that help optimize high-bandwidth disk I/O. The new method is beneficial for existing database applications and allows organizations to perform data analysis tasks that were previously not feasible. Applications are not restricted to businesses, as advances in logistics, science, medicine and healthcare increasingly depend on the analysis of very large data volumes. The CWI spin-off VectorWise, created around this software, was sold in 2011 to Actian Corporation.
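
    As a rough illustration of the execution model, the C++ sketch below shows a vectorized selection primitive that processes a cache-resident vector of values in one tight, branch-free loop that compilers can auto-vectorize with SIMD; the vector size and function name are illustrative, not Vectorwise's actual internals.

        #include <cstddef>
        #include <cstdint>

        // Illustrative vector size: small enough that the inputs, outputs and
        // intermediates of a primitive stay resident in the CPU cache (the real
        // system makes this tunable).
        constexpr std::size_t kVectorSize = 1024;

        // A vectorized selection primitive: instead of interpreting one tuple at a
        // time, it applies the predicate to a whole vector in a tight loop that the
        // compiler can auto-vectorize with SIMD (e.g. SSE). It writes the
        // qualifying positions to sel_out and returns how many there are.
        std::size_t select_less_than(const int32_t* col, std::size_t n,
                                     int32_t bound, uint32_t* sel_out) {
            std::size_t out = 0;
            for (std::size_t i = 0; i < n; ++i) {
                sel_out[out] = static_cast<uint32_t>(i);  // speculative, branch-free write
                out += (col[i] < bound);                  // advance only if it qualified
            }
            return out;
        }

    A query operator would call such primitives once per vector of at most kVectorSize tuples, amortizing interpretation overhead across the whole vector.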
